Scope completion truncation to active provider #49

Open
pntech20 wants to merge 1 commit into WilliamAGH:dev from pntech20:codex/provider-scoped-truncation

Conversation

@pntech20 pntech20 commented May 17, 2026

Summary

  • scopes completion prompt truncation to the provider selected for the current request attempt
  • moves completion truncation into buildCompletionRequest so fallback providers use their own model limits
  • adds regression coverage for OpenAI gpt-4o alongside the default GitHub Models openai/gpt-5

Fixes #40.

Verification

  • JAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat test --tests com.williamcallahan.javachat.service.OpenAiRequestFactoryTest
  • JAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat test --tests com.williamcallahan.javachat.service.OpenAIStreamingServiceTest
  • JAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat spotlessCheck
  • git diff --check

Full ./gradlew.bat test runs 244 tests but fails on the existing Windows environment because EnvironmentVariablePrecedenceTest hardcodes /bin/bash, which is not present here.

Greptile Summary

This PR fixes issue #40 by moving prompt truncation inside buildCompletionRequest so that each provider attempt uses its own model's token limits, rather than applying a single pre-loop truncation derived from the union of both providers' characteristics.

  • OpenAIStreamingService.complete() now passes the raw prompt to buildCompletionRequest, which calls the private truncatePromptForCompletion(String, String) with the resolved model ID for the active provider.
  • OpenAiRequestFactory gains a public provider-arg overload of truncatePromptForCompletion and a private model-ID overload that contains the actual logic; the original single-argument public method now delegates to the provider-arg overload with OPENAI as the default provider.
  • Two new tests verify that gpt-4o (OpenAI) leaves an ~8K-token prompt untouched while openai/gpt-5 (GitHub Models) truncates it with the appropriate notice.
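
The difference between the old and new limit selection can be sketched as follows. This is a minimal illustration rather than the project's code: `LimitScopeSketch` and its method names are hypothetical, the 7,000 figure is the one quoted in the review, the default limit is a placeholder, and the o-series branch is left out for brevity.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of limit selection before and after the PR. */
final class LimitScopeSketch {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;       // figure quoted in the review
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;  // placeholder value, assumed

    /** Assumed normalization: drop any "vendor/" prefix and compare in lower case. */
    static boolean isGpt5Family(String modelId) {
        String name = modelId.substring(modelId.lastIndexOf('/') + 1).toLowerCase(Locale.ROOT);
        return name.startsWith("gpt-5");
    }

    /** Pre-PR shape: any configured GPT-5 model forces the small limit on every attempt. */
    static int limitAcrossConfiguredModels(List<String> configuredModelIds) {
        boolean anyGpt5 = configuredModelIds.stream().anyMatch(LimitScopeSketch::isGpt5Family);
        return anyGpt5 ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }

    /** Post-PR shape: only the model used for the current attempt decides the limit. */
    static int limitForActiveModel(String activeModelId) {
        return isGpt5Family(activeModelId) ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }
}
```

With both `gpt-4o` and `openai/gpt-5` configured, the first method returns the GPT-5 limit even when the attempt goes to `gpt-4o`; the second returns the larger default for that attempt.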

Confidence Score: 4/5

Safe to merge; the core logic change is correct and well-tested for the targeted scenarios.

The refactor correctly scopes truncation to the active provider on each attempt. Two dead public overloads have no production callers after the change, and the o-series 7K limit assumption silently under-serves larger-context models. Neither issue affects correctness for the currently configured models.

OpenAiRequestFactory.java — the dead public overloads and the o-series token limit assumption are worth a second look before this code path grows further.

Important Files Changed

| Filename | Overview |
|---|---|
| src/main/java/com/williamcallahan/javachat/service/OpenAIStreamingService.java | Removes pre-loop truncation and passes the raw prompt to buildCompletionRequest; correct and minimal change. |
| src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java | Moves per-provider truncation into buildCompletionRequest; introduces two public overloads that have no production callers after the refactor (dead API), and the o-series branch still applies GPT-5's 7K limit. |
| src/test/java/com/williamcallahan/javachat/service/OpenAiRequestFactoryTest.java | Adds two tests for the new public provider-arg overload; integration between buildCompletionRequest and truncation for GitHub Models provider is not directly covered. |

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant S as OpenAIStreamingService
    participant F as OpenAiRequestFactory
    participant P as ProviderRoutingService

    C->>S: complete(prompt, temperature)
    S->>P: selectAvailableProviderCandidates(...)
    P-->>S: [providerCandidate, ...]

    loop for each providerCandidate
        S->>F: buildCompletionRequest(prompt, temperature, activeProvider)
        Note over F: normalizedModelId(useGitHubModels)
        F->>F: truncatePromptForCompletion(prompt, modelId)
        Note over F: gpt5Family / reasoningModel check
        F-->>S: ResponseCreateParams (with truncated prompt)
        S->>P: client.responses().create(requestParameters)
        alt success
            P-->>S: Response
            S-->>C: Mono.just(text)
        else RuntimeException
            S->>P: recordProviderFailure(...)
            Note over S: fallback to next provider if eligible
        end
    end
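
The loop in the diagram can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the service's implementation: `FallbackLoopSketch`, `Candidate`, and the `call` callback are hypothetical names, and the reactive `Mono` wrapper, failure recording, and fallback-eligibility checks shown in the diagram are omitted.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the per-attempt request build shown in the diagram. */
final class FallbackLoopSketch {

    /** A provider candidate: a name plus the truncation that fits its own model. */
    static final class Candidate {
        final String name;
        final UnaryOperator<String> truncate;

        Candidate(String name, UnaryOperator<String> truncate) {
            this.name = name;
            this.truncate = truncate;
        }
    }

    /**
     * Tries each candidate in order. The request is rebuilt from the raw prompt on
     * every attempt, so a fallback provider never inherits another provider's truncation.
     */
    static String complete(
            String rawPrompt,
            List<Candidate> candidates,
            BiFunction<String, String, String> call) {
        RuntimeException lastFailure = null;
        for (Candidate candidate : candidates) {
            String requestPrompt = candidate.truncate.apply(rawPrompt);
            try {
                return call.apply(candidate.name, requestPrompt);
            } catch (RuntimeException failure) {
                // The real service records the failure and checks fallback eligibility here.
                lastFailure = failure;
            }
        }
        throw new IllegalStateException("All provider candidates failed", lastFailure);
    }
}
```

The point of the design is that truncation happens inside the loop body, after the provider for that attempt is known, rather than once before the loop.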
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java:127-129
**Dead public API after this refactor**

The single-argument `truncatePromptForCompletion(String prompt)` no longer has any production caller: its only call site in `OpenAIStreamingService.complete()` was removed by this PR, and `buildCompletionRequest` now invokes the private `truncatePromptForCompletion(String, String)` directly. Keeping the method conflicts with [AB1d] ("Delete unused code instead of keeping it 'just in case'") and [RC1b] ("No compatibility shims that hide defects"). The same applies to the new two-arg public overload at line 138, which is also reachable only from tests, because `buildCompletionRequest` bypasses it and calls the private method itself.

Consider removing both public overloads and testing via `buildCompletionRequest` directly, which exercises the full provider-to-model-id path, or make the provider-arg overload the single public entry point.

### Issue 2 of 2
src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java:149-152
**`o`-series models receive GPT-5's 7K token limit regardless of context window**

`canonicalModelName(modelId).startsWith("o")` captures `o1`, `o3`, `o3-mini`, etc. and routes them to `MAX_TOKENS_GPT5_INPUT` (7 000 tokens). Many o-series models expose far larger context windows and this mismatch silently truncates prompts that would fit. This was also true before the PR, but the refactor now makes this path the single authoritative one for both providers, so the blast radius is wider. At minimum the assumption should be documented; at best, o-series models should have their own named constant and explicit limit.

Reviews (1): Last reviewed commit: "Scope completion truncation to active pr..."

Greptile also left 2 inline comments on this PR.

Context used:

  • AGENTS.md (source)
  • CLAUDE.md (source)

Contributor

coderabbitai Bot commented May 17, 2026

📝 Walkthrough

This PR refactors prompt truncation in the OpenAI completion flow to use only the selected model's token limits instead of applying the most restrictive limit across all configured providers. The truncation API now accepts a provider parameter, the internal logic is simplified to check only the resolved model, and callers delegate truncation responsibility to the request factory.

Changes

Provider-Aware Prompt Truncation

| Layer | File(s) | Summary |
|---|---|---|
| Prompt truncation API refactoring | src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java | New public overload truncatePromptForCompletion(String prompt, RateLimitService.ApiProvider provider) resolves the model for the given provider and applies truncation. The original single-argument method delegates to this new overload with OPENAI as default. Core truncation logic now derives gpt5Family and reasoningModel from only the selected modelId, removing prior aggregation across both configured providers. |
| Truncation integration in request factory | src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java | buildCompletionRequest now truncates the completion prompt internally before constructing ResponseCreateParams, shifting responsibility from the caller. |
| Streaming service caller update | src/main/java/com/williamcallahan/javachat/service/OpenAIStreamingService.java | complete(...) removes local prompt truncation and passes the original prompt to buildCompletionRequest, which now handles truncation based on the resolved model. |
| Prompt truncation test coverage | src/test/java/com/williamcallahan/javachat/service/OpenAiRequestFactoryTest.java | Two new unit tests verify that the OPENAI provider applies no truncation (respecting higher OpenAI limits) and the GITHUB_MODELS provider truncates to the 7K GPT-5 input limit with the appropriate notice. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🎯 From limits mixed with care,
A prompt now flows without compare—
Each model gets its perfect slot,
No more context lost to what it's not! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Scope completion truncation to active provider' clearly summarizes the main change of limiting prompt truncation to the currently selected provider. |
| Linked Issues check | ✅ Passed | The changes directly address issue #40 by moving truncation logic into buildCompletionRequest to scope it to the resolved provider, matching the recommended fix and expected behavior. |
| Out of Scope Changes check | ✅ Passed | All code changes are tightly scoped to addressing the provider-specific truncation bug: modifying OpenAIStreamingService, enhancing OpenAiRequestFactory with a new overload, and adding targeted regression tests. |
| Description check | ✅ Passed | The PR description clearly relates to the changeset, explaining the truncation scope changes and referencing the fixes being made. |

Comment on lines 127 to +129
public String truncatePromptForCompletion(String prompt) {
    return truncatePromptForCompletion(prompt, RateLimitService.ApiProvider.OPENAI);
}

P2 Dead public API after this refactor

The single-argument truncatePromptForCompletion(String prompt) no longer has any production caller: its only call site in OpenAIStreamingService.complete() was removed by this PR, and buildCompletionRequest now invokes the private truncatePromptForCompletion(String, String) directly. Keeping the method conflicts with [AB1d] ("Delete unused code instead of keeping it 'just in case'") and [RC1b] ("No compatibility shims that hide defects"). The same applies to the new two-arg public overload at line 138, which is also reachable only from tests, because buildCompletionRequest bypasses it and calls the private method itself.

Consider removing both public overloads and testing via buildCompletionRequest directly, which exercises the full provider-to-model-id path, or make the provider-arg overload the single public entry point.

Context Used: AGENTS.md (source)
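
One way to read the first suggestion (drop the public truncation overloads and test through buildCompletionRequest) is sketched below. Everything here is a hypothetical stand-in: `RequestFactorySketch`, `CompletionRequest`, the model mapping, the notice text, the 4-characters-per-token estimate, and the default limit are assumptions for illustration, not the project's actual types or values. Only the 7,000 figure comes from the review text.

```java
/** Hypothetical, simplified stand-in for the request factory. */
final class RequestFactorySketch {
    enum Provider { OPENAI, GITHUB_MODELS }

    /** Minimal request value so a test can inspect what would be sent. */
    static final class CompletionRequest {
        final String model;
        final String input;

        CompletionRequest(String model, String input) {
            this.model = model;
            this.input = input;
        }
    }

    private static final int MAX_TOKENS_GPT5_INPUT = 7_000;       // from the review text
    private static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;  // placeholder, assumed
    private static final int CHARS_PER_TOKEN = 4;                 // rough heuristic, assumed
    static final String TRUNCATION_NOTICE = "[truncated to fit model input limit]\n";

    /** The only entry point: resolve the model, truncate for it, build the request. */
    CompletionRequest buildCompletionRequest(String prompt, Provider provider) {
        String modelId = resolveModelId(provider);
        return new CompletionRequest(modelId, truncate(prompt, modelId));
    }

    /** Assumed configuration: one fixed model per provider. */
    private static String resolveModelId(Provider provider) {
        return provider == Provider.GITHUB_MODELS ? "openai/gpt-5" : "gpt-4o";
    }

    /** Private on purpose: callers and tests reach it only through the builder. */
    private static String truncate(String prompt, String modelId) {
        String name = modelId.substring(modelId.lastIndexOf('/') + 1);
        int tokenLimit = name.startsWith("gpt-5") ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
        int charLimit = tokenLimit * CHARS_PER_TOKEN;
        if (prompt.length() <= charLimit) {
            return prompt;
        }
        // Keeps the head of the prompt; which end to keep is an assumption of this sketch.
        return TRUNCATION_NOTICE + prompt.substring(0, charLimit);
    }
}
```

A test written against this shape asserts on `CompletionRequest.input` for each provider, which exercises the provider-to-model resolution together with the truncation instead of the truncation alone.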

Comment on lines +149 to 152
boolean gpt5Family = isGpt5Family(modelId);
boolean reasoningModel = gpt5Family || canonicalModelName(modelId).startsWith("o");

int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;

P2 o-series models receive GPT-5's 7K token limit regardless of context window

canonicalModelName(modelId).startsWith("o") captures o1, o3, o3-mini, etc. and routes them to MAX_TOKENS_GPT5_INPUT (7 000 tokens). Many o-series models expose far larger context windows and this mismatch silently truncates prompts that would fit. This was also true before the PR, but the refactor now makes this path the single authoritative one for both providers, so the blast radius is wider. At minimum the assumption should be documented; at best, o-series models should have their own named constant and explicit limit.
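
A possible shape for the "own named constant" option is sketched below. It is illustrative only: `OSeriesLimitSketch`, `MAX_TOKENS_O_SERIES_INPUT`, and the `canonical` helper are hypothetical, and the o-series and default values are placeholders, not verified model limits. The pattern match is also narrower than `startsWith("o")`, which would catch any model name beginning with that letter.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch: an explicit, separately named limit for o-series models. */
final class OSeriesLimitSketch {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;        // figure quoted in the review
    static final int MAX_TOKENS_O_SERIES_INPUT = 50_000;   // placeholder, not a verified limit
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;   // placeholder, assumed

    // Matches names such as "o1", "o3", "o3-mini": the letter o, digits, optional suffix.
    private static final Pattern O_SERIES = Pattern.compile("o\\d+(-.*)?");

    /** Assumed normalization: strip any "vendor/" prefix and lower-case the rest. */
    static String canonical(String modelId) {
        return modelId.substring(modelId.lastIndexOf('/') + 1).toLowerCase(Locale.ROOT);
    }

    /** Picks the input limit from the single model used for this request attempt. */
    static int inputTokenLimit(String modelId) {
        String name = canonical(modelId);
        if (name.startsWith("gpt-5")) {
            return MAX_TOKENS_GPT5_INPUT;
        }
        if (O_SERIES.matcher(name).matches()) {
            return MAX_TOKENS_O_SERIES_INPUT;
        }
        return MAX_TOKENS_DEFAULT_INPUT;
    }
}
```

Keeping the o-series limit in its own constant makes the assumption visible at the point of use and lets it change without touching the GPT-5 value.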

Development

Successfully merging this pull request may close these issues.

[Detail Bug] Prompt truncation uses limits from unused provider, forcing 7K cap on high-context models
